I have run a series of univariate mixed models on some DNA methylation data. The models were looking at the whether DNA methylation at a group of 156 CpG sites was associated with colorectal cancer. The outcome variable was the methylation proportion at the CpG site, and the independent variable of interest was cancer case/control
I ran six different specifications of the model, adjusting for various different combinations of covariates. So I have a total of 6 * 156 = 1,092 results.
The output of the linear mixed models is saved in an RDS file called “lmm_results_combined.rds”. There are 936 rows, each one representing the output of a single univariate model on a CpG. The variables should be fairly self explanatory: they show:
I want to create volcano plots for the outcomes of my models. Volcano plots are the standard way of visualising the results of a series of univariate tests. They show the beta coefficient of the variable of interest on the X-axis, and the negative log10 of the p-value on the Y-axis. -log10 is used to make the visualisation easier: the higher the data point is on the y-axis, the more significant it is.
To add an extra layer of complication, I want to create a separate volcano plot for each different model.
TL;DR Create 6 volcano plots to visualise the results of each batch of models.
First, let’s load the data and create our plot axes.
library(ggplot2)
library(plotly)
### clear your environment
rm(list=ls())
### Load the data
results <- readRDS("lmm_results_combined.rds")
plt <- ggplot(data=results, mapping = aes(x=coef, y=pval))
plt
Note though that the y axis is on the wrong scale. We want -log10 scale for the p-values, so that higher points on the y-axis are more significant. This can be done directly within the ggplot - you don’t need to edit the data in results.
ggplot(data=results, mapping = aes(x=coef, y=-log10(pval))) # data and aesthetics
Now we can add the geometric objects - our data points. A volcano plot is just a type of scatter plot, so we want geom_point. You can pass other parameters into the geom object, including ‘alpha’, a value between zero and one that changes the transparency of the geometric object
ggplot(data = results, mapping = aes(x = coef, y = -log10(pval))) +
geom_point(size=0.8) # add geom_point for scatter plot data points
We might also want to add lines to show the p=0.05 and p=0.05 (Bonferroni adjusted) thresholds. These lines are just annotations that can be manually added as another geom object.
### calculate bonferroni adjusted pval:
### 0.05 divided by the number of tests run for any given model
bonf.pval <- round(0.05 / sum(results$model == "model1"),5)
ggplot(data = results, mapping = aes(x = coef, y = -log10(pval))) +
geom_point(size=0.8) +
geom_hline(linetype="dotted", yintercept=-log10(0.05), colour= "grey40") + # add dashed lne for pvalue
geom_hline(linetype="dotted", yintercept=-log10(bonf.pval)) # add dashed lne for Bonferroni pvalue
We now want to label the dashed lines. This can be done by adding an annotation layer. Annotation layers are not mapped from variables in the data - you pass in the information manually.
You have to specify the exact position of the annotation on the plot using x and y coordinates. You can also change the text size and colour in the parameters inside the annotation object.
ggplot(data = results, mapping = aes(x = coef, y = -log10(pval))) +
geom_point(size=0.8) +
geom_hline(linetype="dotted", yintercept=-log10(0.05)) +
geom_hline(linetype="dotted", yintercept=-log10(bonf.pval), col = "grey60") +
annotate("text", x = -0.22, y = -log10(0.032), label = "p=0.05",
size = 2, col = "grey30") + # annotate pvalue line
annotate("text", x = -0.22, y = -log10(bonf.pval - 0.0001), label = paste("p=",bonf.pval),
size = 2, col = "grey30") # annotate pvalue line (adjusted)
No statistical manipulation is essential for a volcano plot.
Currently, the plot is showing all the data from every model on one plot, which doesn’t make sense. We want a separate plot for every model. This can be easily achieved with facets.
ggplot(data = results, mapping = aes(x = coef, y = -log10(pval))) +
geom_point(size=0.8) +
geom_hline(linetype="dotted", yintercept=-log10(0.05), col = "grey60") +
geom_hline(linetype="dotted", yintercept=-log10(bonf.pval), col = "grey60") +
annotate("text", x = -0.22, y = -log10(0.032), label = "p=0.05",
size = 2, col = "grey30") +
annotate("text", x = -0.22, y = -log10(bonf.pval - 0.0001), label = paste("p=",bonf.pval),
size = 2, col = "grey30")+
facet_wrap(.~model) # create plot for each different facet (model)
Now we want to add colour to indicate the significance of the model results. We may also want to change the overall visual appearance of the plot.
We can add colour by amending ‘col’ in the aesthetic mapping.
ggplot(data = results, mapping = aes(x = coef, y = -log10(pval),
col=significance)) + # add colour, contingent on another variable
geom_point(size=0.8) +
geom_hline(linetype="dotted", yintercept=-log10(0.05), col = "grey60") +
geom_hline(linetype="dotted", yintercept=-log10(bonf.pval), col = "grey60") +
annotate("text", x = -0.22, y = -log10(0.032), label = "p=0.05",
size = 2, col = "grey30") +
annotate("text", x = -0.22, y = -log10(bonf.pval - 0.0001), label = paste("p=",bonf.pval),
size = 2, col = "grey30")+
facet_wrap(.~model)
This is fine but the colours aren’t very appealing. They can be amended using scale_colour_manual
ggplot(data = results, mapping = aes(x = coef, y = -log10(pval),
col=significance)) +
geom_point(size=0.8) +
geom_hline(linetype="dotted", yintercept=-log10(0.05), col = "grey60") +
geom_hline(linetype="dotted", yintercept=-log10(bonf.pval), col = "grey60") +
annotate("text", x = -0.22, y = -log10(0.032), label = "p=0.05",
size = 2, col = "grey30") +
annotate("text", x = -0.22, y = -log10(bonf.pval - 0.0001), label = paste("p=",bonf.pval),
size = 2, col = "grey30")+
facet_wrap(.~model) +
scale_colour_manual(name = "Significance", values = c("grey60","darkorange1", "firebrick2"),
labels = levels(results$significance), drop = FALSE) # change the colours of the data points
I want to change the overall theme of the plot. You need the ggthemes package installed.
library(ggthemes)
ggplot(data = results, mapping = aes(x = coef, y = -log10(pval),
col=significance)) +
geom_point(size=0.8) +
geom_hline(linetype="dotted", yintercept=-log10(0.05), col = "grey60") +
geom_hline(linetype="dotted", yintercept=-log10(bonf.pval), col = "grey60") +
annotate("text", x = -0.22, y = -log10(0.032), label = "p=0.05",
size = 2, col = "grey30") +
annotate("text", x = -0.22, y = -log10(bonf.pval - 0.0001), label = paste("p=",bonf.pval),
size = 2, col = "grey30")+
facet_wrap(.~model) +
scale_colour_manual(name = "Significance", values = c("grey60","darkorange1", "firebrick2"),
labels = levels(results$significance), drop = FALSE) +
theme_few() # change theme
We can use xlab(), ylab() and ggtitle() to change axes titles and the figure title.
ggplot(data = results, mapping = aes(x = coef, y = -log10(pval),
col=significance)) +
geom_point(size=0.8) +
geom_hline(linetype="dotted", yintercept=-log10(0.05), col = "grey60") +
geom_hline(linetype="dotted", yintercept=-log10(bonf.pval), col = "grey60") +
annotate("text", x = -0.22, y = -log10(0.032), label = "p=0.05",
size = 2, col = "grey30") +
annotate("text", x = -0.22, y = -log10(bonf.pval - 0.0001), label = paste("p=",bonf.pval),
size = 2, col = "grey30")+
facet_wrap(.~model) +
scale_colour_manual(name = "Significance", values = c("grey60","darkorange1", "firebrick2"),
labels = levels(results$significance), drop = FALSE) +
theme_few() +
xlab("Beta coefficient") + ylab("-log10 p-value") + # add xlabels and titles
ggtitle("Results of univariate linear mixed modelling")
p <- ggplot(data = results, mapping = aes(x = coef, y = -log10(pval),
col=significance)) +
geom_point(size=0.8) +
geom_hline(linetype="dotted", yintercept=-log10(0.05), col = "grey60") +
geom_hline(linetype="dotted", yintercept=-log10(bonf.pval), col = "grey60") +
annotate("text", x = -0.22, y = -log10(0.032), label = "p=0.05",
size = 2, col = "grey30") +
annotate("text", x = -0.22, y = -log10(bonf.pval - 0.0001), label = paste("p=",bonf.pval),
size = 2, col = "grey30")+
facet_wrap(.~model) +
scale_colour_manual(name = "Significance", values = c("grey60","darkorange1", "firebrick2"),
labels = levels(results$significance), drop = FALSE) +
theme_few() +
xlab("Beta coefficient") + ylab("-log10 p-value") + # add xlabels and titles
ggtitle("Results of univariate linear mixed modelling")
### Save the plot
# We can use ggsave to save the plot
ggsave("volcano_plots.png", plot = p, device = NULL, path = NULL,
scale = 1, width = 330, height = 250, units = "mm",
dpi = 300, limitsize = TRUE)
Plotly has a whole grammar of its own, but we can easily take advantage of some of the functionality by feeding a ggplot object into the ggplotly() function from the plotly package. This automatically adds interactive elements to the plot.
p <- ggplot(data = results, mapping = aes(x = coef, y = -log10(pval),
col=significance)) +
geom_point(size=0.8) +
geom_hline(linetype="dotted", yintercept=-log10(0.05), col = "grey60") +
geom_hline(linetype="dotted", yintercept=-log10(bonf.pval), col = "grey60") +
annotate("text", x = -0.22, y = -log10(0.032), label = "p=0.05",
size = 2, col = "grey30") +
annotate("text", x = -0.22, y = -log10(bonf.pval - 0.0001), label = paste("p=",bonf.pval),
size = 2, col = "grey30")+
facet_wrap(.~model) +
scale_colour_manual(name = "Significance", values = c("grey60","darkorange1", "firebrick2"),
labels = levels(results$significance), drop = FALSE) +
theme_few() +
xlab("Beta coefficient") + ylab("-log10 p-value") +
ggtitle("Results of univariate linear mixed modelling")
### pass the ggplot object into the ggplotly function
ggplotly(p)
### save html object
ggp <- ggplotly(p)
library(htmlwidgets)
htmlwidgets::saveWidget(ggp, "interactive_volcano_plot.html")